{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "deletable": true,
    "editable": true,
    "id": "4f3CKqFUqL2-",
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Custom Estimator"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "**Learning Objectives:**\n",
    "  * Use a custom estimator of the `Estimator` class in TensorFlow to predict median housing price"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "The data is based on 1990 census data from California. This data is at the city block level, so these features reflect the total number of rooms in that block, or the total number of people who live on that block, respectively.\n",
    "<p>\n",
    "Let's use a set of features to predict house value."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "deletable": true,
    "editable": true,
    "id": "6TjLjL9IU80G"
   },
   "source": [
    "## Set Up\n",
    "In this first cell, we'll load the necessary libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "import math\n",
    "import shutil\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import tensorflow as tf\n",
    "\n",
    "tf.logging.set_verbosity(tf.logging.INFO)\n",
    "pd.options.display.max_rows = 10\n",
    "pd.options.display.float_format = '{:.1f}'.format"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "deletable": true,
    "editable": true,
    "id": "ipRyUHjhU80Q"
   },
   "source": [
    "Next, we'll load our data set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "df = pd.read_csv(\"https://storage.googleapis.com/ml_universities/california_housing_train.csv\", sep = \",\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "deletable": true,
    "editable": true,
    "id": "HzzlSs3PtTmt",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "## Examine the data\n",
    "\n",
    "It's a good idea to get to know your data a little bit before you work with it.\n",
    "\n",
    "We'll print out a quick summary of a few useful statistics on each column.\n",
    "\n",
    "This will include things like mean, standard deviation, max, min, and various quantiles."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "This data is at the city block level, so these features reflect the total number of rooms in that block, or the total number of people who live on that block, respectively.  Let's create a different, more appropriate feature.  Because we are predicing the price of a single house, we should try to make all our features correspond to a single house as well"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "df['num_rooms'] = df['total_rooms'] / df['households']\n",
    "df['num_bedrooms'] = df['total_bedrooms'] / df['households']\n",
    "df['persons_per_house'] = df['population'] / df['households']\n",
    "df.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "df.drop(['total_rooms', 'total_bedrooms', 'population', 'households'], axis = 1, inplace = True)\n",
    "df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "deletable": true,
    "editable": true,
    "id": "Lr6wYl2bt2Ep",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "## Build a custom estimator linear regressor\n",
    "\n",
    "In this exercise, we'll be trying to predict `median_house_value`. It will be our label. We'll use the remaining columns as our input features.\n",
    "\n",
    "To train our model, we'll use the Estimator API and create a custom estimator for linear regression.\n",
    "\n",
    "Note that we don't actually need a custom estimator for linear regression since there is a canned estimator for it, however we're keeping it simple so you can practice creating a custom estimator function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# Define feature columns\n",
    "feature_columns = {\n",
    "  colname : tf.feature_column.numeric_column(colname) \\\n",
    "    for colname in ['housing_median_age','median_income','num_rooms','num_bedrooms','persons_per_house']\n",
    "}\n",
    "# Bucketize lat, lon so it's not so high-res; California is mostly N-S, so more lats than lons\n",
    "feature_columns['longitude'] = tf.feature_column.bucketized_column(tf.feature_column.numeric_column('longitude'), np.linspace(-124.3, -114.3, 5).tolist())\n",
    "feature_columns['latitude'] = tf.feature_column.bucketized_column(tf.feature_column.numeric_column('latitude'), np.linspace(32.5, 42, 10).tolist())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# Split into train and eval and create input functions\n",
    "msk = np.random.rand(len(df)) < 0.8\n",
    "traindf = df[msk]\n",
    "evaldf = df[~msk]\n",
    "\n",
    "SCALE = 100000\n",
    "BATCH_SIZE=128\n",
    "train_input_fn = tf.estimator.inputs.pandas_input_fn(x = traindf[list(feature_columns.keys())],\n",
    "                                                    y = traindf[\"median_house_value\"] / SCALE,\n",
    "                                                    num_epochs = None,\n",
    "                                                    batch_size = BATCH_SIZE,\n",
    "                                                    shuffle = True)\n",
    "eval_input_fn = tf.estimator.inputs.pandas_input_fn(x = evaldf[list(feature_columns.keys())],\n",
    "                                                    y = evaldf[\"median_house_value\"] / SCALE,  # note the scaling\n",
    "                                                    num_epochs = 1, \n",
    "                                                    batch_size = len(evaldf), \n",
    "                                                    shuffle=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# Create the custom estimator\n",
    "def custom_estimator(features, labels, mode, params):  \n",
    "  # 0. Extract data from feature columns\n",
    "  input_layer = tf.feature_column.input_layer(features, params['feature_columns'])\n",
    "  \n",
    "  # 1. Define Model Architecture\n",
    "  predictions = tf.layers.dense(input_layer,1,activation=None)\n",
    "  \n",
    "  # 2. Loss function, training/eval ops\n",
    "  if mode == tf.estimator.ModeKeys.TRAIN or mode == tf.estimator.ModeKeys.EVAL:\n",
    "    labels = tf.expand_dims(tf.cast(labels, dtype=tf.float32), -1)\n",
    "    loss = tf.losses.mean_squared_error(labels, predictions)\n",
    "    optimizer = tf.train.FtrlOptimizer(learning_rate=0.2)\n",
    "    train_op = optimizer.minimize(\n",
    "      loss = loss,\n",
    "      global_step = tf.train.get_global_step())\n",
    "    eval_metric_ops = {\n",
    "      \"rmse\": tf.metrics.root_mean_squared_error(labels*SCALE, predictions*SCALE)\n",
    "    }\n",
    "  else:\n",
    "    loss = None\n",
    "    train_op = None\n",
    "    eval_metric_ops = None\n",
    "  \n",
    "  # 3. Create predictions\n",
    "  predictions_dict = #TODO: create predictions dictionary\n",
    "  \n",
    "  # 4. Create export outputs\n",
    "  export_outputs = #TODO: create export_outputs dictionary\n",
    "  \n",
    "  # 5. Return EstimatorSpec\n",
    "  return tf.estimator.EstimatorSpec(\n",
    "      mode = mode,\n",
    "      predictions = predictions_dict,\n",
    "      loss = loss,\n",
    "      train_op = train_op,\n",
    "      eval_metric_ops = eval_metric_ops,\n",
    "      export_outputs = export_outputs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# Create serving input function\n",
    "def serving_input_fn():\n",
    "  feature_placeholders = {\n",
    "      colname : tf.placeholder(tf.float32, [None]) for colname in 'housing_median_age,median_income,num_rooms,num_bedrooms,persons_per_house'.split(',')\n",
    "  }\n",
    "  feature_placeholders['longitude'] = tf.placeholder(tf.float32, [None])\n",
    "  feature_placeholders['latitude'] = tf.placeholder(tf.float32, [None])\n",
    "  \n",
    "  features = {\n",
    "    key: tf.expand_dims(tensor, -1)\n",
    "    for key, tensor in feature_placeholders.items()\n",
    "  }\n",
    "    \n",
    "  return tf.estimator.export.ServingInputReceiver(features, feature_placeholders)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# Create custom estimator's train and evaluate function\n",
    "def train_and_evaluate(output_dir):\n",
    "  estimator = # TODO: Add estimator, make sure to add params={'feature_columns': list(feature_columns.values())} as an argument\n",
    "  \n",
    "  train_spec = tf.estimator.TrainSpec(\n",
    "    input_fn = train_input_fn,\n",
    "    max_steps = 1000)\n",
    "  exporter = tf.estimator.LatestExporter('exporter', serving_input_fn)\n",
    "  eval_spec = tf.estimator.EvalSpec(\n",
    "    input_fn = eval_input_fn,\n",
    "    steps = None,\n",
    "    exporters = exporter)\n",
    "  tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)\n",
    "\n",
    "#Run Training\n",
    "OUTDIR = 'custom_estimator_trained_model'\n",
    "shutil.rmtree(OUTDIR, ignore_errors = True) # start fresh each time\n",
    "train_and_evaluate(OUTDIR)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "source": [
    "## Challenge Excercise\n",
    "\n",
    "Modify the `custom_estimator` function to be a neural network with one hidden layer, instead of a linear regressor"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "default_view": {},
   "name": "first_steps_with_tensor_flow.ipynb",
   "provenance": [],
   "version": "0.3.2",
   "views": {}
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
